Abstract

This paper presents the system adopted by the PRIS team for the Faceted Blog Distillation task. The PRIS system is submitted by the Pattern Recognition and Intelligent System Lab at Beijing University of Posts and Telecommunications. A two-stage strategy is used for this task. First, an adaptable Voting Model performs blog distillation. Then, different models are designed to judge the facets and produce the rankings.


1  Introduction 

We participated in the Faceted Blog Distillation task of the TREC 2010 Blog Track. The task is the same as the Faceted Blog Distillation task of 2009, and three kinds of facets are used: 'opinionated' vs. 'factual', 'personal' vs. 'official' and 'in-depth' vs. 'shallow'.

The PRIS system adopts a two-stage strategy for faceted blog distillation. The first stage is baseline blog distillation, which consists only of ranking the blogs that are relevant to the topic. An adaptable voting model with the Posts Average algorithm (PA) is designed for this stage. In the second stage, different models are used to identify the different facets. For the 'opinionated' vs. 'factual' facets, an opinion lexicon and a factual lexicon are used in sentiment analysis to distinguish the two facets. For the 'in-depth' vs. 'shallow' facets, an improved in-depth analysis model based on the L-Qtf (Length-Query term frequency) coefficient is applied. For the 'personal' vs. 'official' facets, a personal lexicon and an official lexicon are generated by Information Gain (IG).

In  Section  2,  we  introduce  the  blog  distillation  algorithm  and  facets  models  respectively.  In 
Section  3,  the  evaluation  of  the  faceted  blog  distillation  system  is  presented.  Finally  in  Section  4, 
conclusions  and  comments  on  the  future  work  are  given. 


2  The  Faceted  Blog  Distillation  System 
2.1  Blog  distillation 

The aim of blog distillation is to identify blogs that have a recurring interest in the query topic area. In our system, we use the adaptable Voting Model [1] for blog distillation. In this model, blogs are ranked on the basis of the ranking of their posts with respect to the query. Highly ranked posts are seen as votes for the blog they belong to, so a blog with many highly ranked posts is ranked above a blog with fewer or lower-ranked posts. In the simplest technique, called Votes, a blog is scored by the number of its posts that are retrieved in response to the query. In particular, the retrieval score for a blog B with respect to a query Q, denoted Score_Votes(B,Q), is:

$$Score_{Votes}(B,Q) = |R(Q) \cap posts(B)| \qquad (1)$$

where R(Q) is the underlying ranking of blog posts, and posts(B) is the set of posts belonging to blog B. Note that each post is associated with exactly one blog.

Our system ranks each blog by the sum of the relevance scores of all the retrieved posts of the blog, and strengthens the highly scored posts by applying the root function (strong votes evidence):

$$Score_{rootSum}(B,Q) = \sum_{p \in R(Q) \cap posts(B)} \sqrt{score(p,Q)} \qquad (2)$$

However, an issue with such a technique is that productive bloggers may gain an unfair advantage in the ranking. The more a blogger writes, the more likely it is that a query term appears at random in a blog post (e.g., many blog posts contain links to other recent posts, with the title of each post identical to the link anchor), and hence the blog receives extra erroneous votes. To address this, we adopt a normalization technique, called the Posts Average algorithm (PA), that takes the number of posts of the blog into account. The normalized score of a blog is computed as follows:


$$Score_{Norm}(B,Q) = \frac{\sum_{p \in R(Q) \cap posts(B)} \sqrt{score(p,Q)}}{|posts(B)|} \qquad (3)$$


where |posts(B)| denotes the total number of posts of blog B.
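The three scoring variants of Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustration: the function name `score_blogs` and its input structures are hypothetical, the "root function" is assumed to be a square root, and post scores are assumed to be non-negative.

```python
import math
from collections import defaultdict


def score_blogs(ranked_posts, post_to_blog, blog_sizes):
    """Aggregate post scores into blog scores.

    ranked_posts: list of (post_id, relevance_score) pairs, i.e. R(Q).
    post_to_blog: dict mapping each post_id to its blog_id.
    blog_sizes:   dict mapping blog_id to |posts(B)|, the blog's total post count.
    Returns three dicts: Votes, rootSum and the PA-normalized score.
    """
    votes = defaultdict(int)
    root_sum = defaultdict(float)
    for post_id, score in ranked_posts:
        blog = post_to_blog[post_id]
        votes[blog] += 1                    # Eq. (1): count retrieved posts
        root_sum[blog] += math.sqrt(score)  # Eq. (2): assumed square root
    # Eq. (3): divide by the total number of posts of the blog
    norm = {b: root_sum[b] / blog_sizes[b] for b in root_sum}
    return dict(votes), dict(root_sum), norm


ranked = [("p1", 4.0), ("p2", 9.0), ("p3", 1.0)]
votes, root_sum, norm = score_blogs(
    ranked,
    {"p1": "A", "p2": "A", "p3": "B"},
    {"A": 10, "B": 1},
)
```

In this toy input, blog A wins on Votes and on the root sum, but after normalization the small blog B, whose only post is relevant, is ranked above the prolific blog A.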

Moreover, query expansion is added to our system to improve retrieval accuracy. From the aspect of topic understanding, the Learning Query Expansion (LQE) model, based on a semi-machine learning method, is designed as in our Blog Track 2009 system [2].

We trained the LQE model, which is based on CRFs, on the Blog Track 2008 queries, which had been manually expanded using human common sense and comprehension. After the classifier was trained, it was applied to all of the Blog Track 2010 queries. The resulting expanded queries contain both expansion words and their weights, expressed in the Indri query language. One of the final queries is shown below:

<query> 

<number>  1151  </number> 

<text>  information  warfare  #5(information  warfare). (title) 

#weight(1.0  #combine(information  warfare)  0.8  attacks  0.8  cybersecurity 
1.0  cyberwarfare  0.8  information  0.1  warfare  ) 

</text> 

</query> 
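As an illustration of how such a weighted query string can be assembled from LQE output, here is a small Python sketch. The helper `build_indri_query` and its input format are hypothetical; only the `#weight` and `#combine` operators are taken from the example above.

```python
def build_indri_query(title_terms, expansions):
    """Build a #weight query from the title terms and weighted expansion terms.

    title_terms: list of words from the topic title.
    expansions:  list of (term, weight) pairs, e.g. produced by a model like LQE.
    """
    phrase = " ".join(title_terms)
    # The original title is kept as a #combine clause with weight 1.0
    parts = [f"1.0 #combine({phrase})"]
    parts += [f"{weight} {term}" for term, weight in expansions]
    return "#weight(" + " ".join(parts) + ")"


query = build_indri_query(
    ["information", "warfare"],
    [("attacks", 0.8), ("cybersecurity", 0.8), ("cyberwarfare", 1.0),
     ("information", 0.8), ("warfare", 0.1)],
)
```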


2.2  Opinionated  vs.  Factual  Model 

This model contains three stages. First, an existing opinion lexicon proposed in [3] is adopted in our system, while the factual lexicon is generated automatically. Second, the opinion lexicon and the newly generated factual lexicon are used to calculate the opinion score and the factual score respectively. Finally, a ranking scheme generates the final rankings of opinionated and factual blogs.

2.2.1  Generating  a  factual  lexicon 

The factual lexicon is generated automatically based on Information Gain (IG) and Mutual Information (MI). For each term t in the blog posts, its IG weight is calculated as follows:

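$$IG(t) = p(t)\left[p(O|t)\log\frac{p(O|t)}{p(O)} + p(F|t)\log\frac{p(F|t)}{p(F)}\right] + p(\bar{t})\left[p(O|\bar{t})\log\frac{p(O|\bar{t})}{p(O)} + p(F|\bar{t})\log\frac{p(F|\bar{t})}{p(F)}\right] \qquad (4)$$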

O  and  F  denote  the  opinionated  and  factual  blogs  respectively. 

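It is assumed that the numbers of opinionated and factual blogs in the training collection are A and B respectively. Then,

$$p(t) = \frac{df(t|A,B)}{A+B} \qquad (5)$$

$$p(O) = \frac{A}{A+B} \qquad (6)$$

$$p(O|t) = \frac{df(t|A)}{df(t|A,B)} \qquad (7)$$

$$p(O|\bar{t}) = \frac{A - df(t|A)}{(A+B) - df(t|A,B)} \qquad (8)$$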

$p(\bar{t})$, $p(F)$, $p(F|t)$ and $p(F|\bar{t})$ can be easily deduced from the equations above. Terms whose IG values are above a preset threshold are selected as candidates for the lexicon.
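The IG computation of Eqs. (4) to (8) can be sketched directly from document frequencies. The following Python function is a minimal illustration, not the system's actual code; it assumes that df(t|A,B) is the number of blogs containing t in both sets combined, and it treats 0 log 0 as 0.

```python
import math


def _plogp(p_cond, p_class):
    # One summand p(c|x) * log(p(c|x) / p(c)), with 0 * log(0) taken as 0
    return 0.0 if p_cond == 0 else p_cond * math.log(p_cond / p_class)


def information_gain(df_o, df_f, n_o, n_f):
    """IG of a term from document frequencies.

    df_o, df_f: opinionated / factual blogs containing the term.
    n_o, n_f:   total opinionated / factual blogs (A and B in the text).
    """
    n = n_o + n_f
    df = df_o + df_f          # assumed meaning of df(t|A,B)
    p_t = df / n              # Eq. (5)
    p_o, p_f = n_o / n, n_f / n
    ig = 0.0
    if df > 0:                # term-present part of Eq. (4)
        ig += p_t * (_plogp(df_o / df, p_o) + _plogp(df_f / df, p_f))
    if df < n:                # term-absent part of Eq. (4)
        rest = n - df
        ig += (1 - p_t) * (_plogp((n_o - df_o) / rest, p_o)
                           + _plogp((n_f - df_f) / rest, p_f))
    return ig
```

A term spread evenly over both classes gets an IG of zero, while a term that occurs in every opinionated blog and in no factual blog (with equal class sizes) reaches the maximum of log 2.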

For the candidates produced above, we further compute a term weight according to a document-frequency-based version of the Mutual Information metric [4]:


$$MI(t) = p(t,F)\log\frac{p(t,F)}{p(t)p(F)} + p(\bar{t},O)\log\frac{p(\bar{t},O)}{p(\bar{t})p(O)} \qquad (9)$$

$$p(t,O) = \frac{df(t|A)}{A+B} \qquad (10)$$


$p(t,F)$, $p(\bar{t},F)$ and $p(\bar{t},O)$ can be easily deduced in the same way. A second threshold is then applied to generate the final factual lexicon.
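A sketch of the MI weight of Eq. (9) in Python follows. It assumes the reading in which the factual weight combines the (term present, factual) and (term absent, opinionated) events, by analogy with the personal and official weights of Section 2.4.2; the function name and arguments are illustrative only.

```python
import math


def factual_mi(df_o, df_f, n_o, n_f):
    """Document-frequency-based MI weight of a term for the factual class.

    df_o, df_f: opinionated / factual blogs containing the term.
    n_o, n_f:   total opinionated / factual blogs (A and B in the text).
    """
    n = n_o + n_f
    p_t = (df_o + df_f) / n
    p_o, p_f = n_o / n, n_f / n
    p_t_f = df_f / n               # p(t, F)
    p_nt_o = (n_o - df_o) / n      # p(not t, O)

    def part(joint, marginal_product):
        # joint * log(joint / (p(x) p(c))), with 0 * log(0) taken as 0
        if joint == 0 or marginal_product == 0:
            return 0.0
        return joint * math.log(joint / marginal_product)

    return part(p_t_f, p_t * p_f) + part(p_nt_o, (1 - p_t) * p_o)
```

Under this reading, a term that appears only in factual blogs receives a high weight, while a term that appears only in opinionated blogs receives none, so thresholding the weight keeps factual indicators.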


2.2.2  Computing  opinion  score  and  factual  score 

Given a query, for each term t in the opinion lexicon or the factual lexicon, we first compute a tf-idf weight $w_{tf \cdot idf}(t)$ in the relevant document collection provided by the baseline blog distillation task. At the same time, we use the Bo1 term weighting model [5] to compute the query weight $w_{Bo1}(q)$ in the collection. We then add the two weights to obtain the opinion score $score_{op}$ or the factual score $score_{fa}$.


2.2.3  Ranking 

First, we obtain the baseline relevance score Score(B,Q) for each blog. We then use Score(B,Q) × score_op as the final score for ranking opinionated blogs and Score(B,Q) × score_fa as the final score for ranking factual blogs.


2.3  In-depth  vs.  Shallow  Model 


In the in-depth facet stage, the improved in-depth analysis model is adopted. The facet of a blog is judged on the basis of all the posts in it. Intuitively, an in-depth post ideally expresses the author's opinion on the given topic in detail and is therefore long. To minimize the impact of spam content, the post length relative to the average length is used as a feature of the in-depth degree. Length alone is not sufficient, however: to confirm the degree of relevance, the query term frequency in the post must also be considered. The post length and the query term frequency are combined in the following L-Qtf coefficient [2]:


$$L\text{-}Qtf = \sum_{t \in Q \cap D} \frac{1 + \ln(1 + \ln(tf))}{(1 - s) + s\,\frac{dl}{avdl}} \cdot qtf \qquad (11)$$


where tf and qtf are the frequencies of the query term in the post and in the query respectively, both calculated after stemming; dl is the post length; avdl is the average length of all relevant posts for the topic; and s is a parameter, set to 0.2 in our experiments. The L-Qtf coefficient is a kind of pivoted weighting coefficient [6] [7].
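The following Python sketch shows one way to compute the L-Qtf coefficient of Eq. (11) for a single post. It is an illustration under stated assumptions: the inputs are already tokenized and stemmed, post length is measured in tokens, and qtf enters as a multiplicative factor; the function name `l_qtf` is hypothetical.

```python
import math
from collections import Counter


def l_qtf(post_terms, query_terms, avdl, s=0.2):
    """Pivoted length-normalized query term frequency score of one post.

    post_terms:  list of (stemmed) tokens of the post.
    query_terms: list of (stemmed) tokens of the query.
    avdl:        average length of the relevant posts for the topic.
    s:           pivot slope, 0.2 as in the experiments described above.
    """
    tf = Counter(post_terms)
    qtf = Counter(query_terms)
    dl = len(post_terms)
    norm = (1 - s) + s * dl / avdl  # pivoted length normalization
    total = 0.0
    for term, q in qtf.items():
        if tf[term] > 0:            # only terms in both query and post
            total += (1 + math.log(1 + math.log(tf[term]))) / norm * q
    return total


short_post = ["cyber"] + ["x"] * 9    # dl = 10
long_post = ["cyber"] + ["x"] * 19    # dl = 20, same tf
score_short = l_qtf(short_post, ["cyber"], avdl=10.0)
score_long = l_qtf(long_post, ["cyber"], avdl=10.0)
```

With the same query term frequency, the post of average length scores 1.0 while the post of twice the average length is scaled down by the normalization factor 1.2.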

All the posts of the topic-relevant blogs returned by blog distillation are ranked according to the in-depth coefficient. In the ranked list, the top 45% of the topic-relevant posts are considered in-depth, while the last 45% are considered shallow. The indicators indepth(post_i,Q) and shallow(post_j,Q) record whether a post is in-depth or shallow: indepth(post_i,Q) is 1 if post_i is in the top 45% of the ranked list and 0 otherwise, and shallow(post_j,Q) is defined analogously for the last 45%. Summing the indicators over the posts of a blog gives its total numbers of in-depth and shallow posts. The in-depth degree (Score) of each blog is calculated from the relationship between its in-depth posts and shallow posts, as in the following equation:

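$$Score(blog_x, Q) = \frac{\sum_{i=1}^{n} indepth(post_i, Q) - \sum_{j=1}^{n} shallow(post_j, Q)}{n} \qquad (12)$$

$$indepth(post_i, Q) = \begin{cases} 1 & \text{if } post_i \text{ is an in-depth post} \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$

$$shallow(post_j, Q) = \begin{cases} 1 & \text{if } post_j \text{ is a shallow post} \\ 0 & \text{otherwise} \end{cases} \qquad (14)$$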


The larger the Score, the deeper the feed; the smaller the Score, the shallower the feed.
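The procedure of Eqs. (12) to (14) can be sketched as follows in Python. The sketch assumes that n is the number of topic-relevant posts of the blog and that the 45% cut-offs are rounded down; the function name and the tuple layout are illustrative.

```python
def indepth_scores(posts, fraction=0.45):
    """In-depth degree per blog.

    posts: list of (post_id, blog_id, l_qtf_value) for all topic-relevant posts.
    Returns a dict blog_id -> (in-depth posts - shallow posts) / n.
    """
    ranked = sorted(posts, key=lambda p: p[2], reverse=True)
    k = int(len(ranked) * fraction)              # size of the top / bottom slice
    deep = {p[0] for p in ranked[:k]}
    shallow = {p[0] for p in ranked[len(ranked) - k:]} if k else set()
    totals, counts = {}, {}
    for post_id, blog_id, _ in ranked:
        counts[blog_id] = counts.get(blog_id, 0) + 1
        # +1 for an in-depth post, -1 for a shallow post, 0 otherwise
        delta = (post_id in deep) - (post_id in shallow)
        totals[blog_id] = totals.get(blog_id, 0) + delta
    return {b: totals[b] / counts[b] for b in counts}


# Ten posts with L-Qtf values 10 down to 1; the first five belong to blog A
posts = [(f"p{i}", "A" if i < 5 else "B", 10 - i) for i in range(10)]
scores = indepth_scores(posts)
```

Here four of blog A's five posts fall into the top slice and four of blog B's five posts into the bottom slice, so the two blogs receive 0.8 and -0.8 respectively.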

According to our experiments, ranking by L-Qtf is more effective for the in-depth facet, while the shallow facet depends on both L-Qtf and the length of the post.

In the in-depth facet task, a feed should be judged not only on topic relevance but also on the facet. To account for both, a combination model is adopted:


$$S_j = \begin{cases} Score(blog_x, Q) \times Score_{Norm}(B, Q) & Score(blog_x, Q) > 0 \\ 1 - Score(blog_x, Q) \times Score_{Norm}(B, Q) & Score(blog_x, Q) < 0 \end{cases} \qquad (15)$$


$S_j$ is the final confidence value of the blog B. Score(blog_x,Q) is the facet result from Eq. (12), and Score_Norm(B,Q) is obtained from the result of blog distillation. The combination model uses multiplication to take both topic relevance and the facet result into account. According to our experiments, the combination model with multiplication is more effective than the model with addition given in Eq. (16) [2].


$$S_j = \mu \times Score(blog_x, Q) + (1 - \mu) \times Score_{Norm}(B, Q) \qquad (16)$$


μ is a weighting parameter in the interval [0, 1] that balances the facet score and the topical similarity score.

2.4  Personal  vs.  Official  Model 

For the personal vs. official facet, we compute the Information Gain (IG) values of the terms. We then extract the terms with higher IG values to build the lexicons, taking sentiment into account at the same time. The lexicons are then used to score and rank the related blogs for each facet.

2.4.1  Calculating  IG 

We  calculate  IG  values  of  the  terms  using  the  TREC  Blogs08  collection. 


$$IG(t) = p(t)\left[p(c_1|t)\log\frac{p(c_1|t)}{p(c_1)} + p(c_2|t)\log\frac{p(c_2|t)}{p(c_2)}\right] + p(\bar{t})\left[p(c_1|\bar{t})\log\frac{p(c_1|\bar{t})}{p(c_1)} + p(c_2|\bar{t})\log\frac{p(c_2|\bar{t})}{p(c_2)}\right] \qquad (17)$$


where t is the term being processed, $c_1$ is the personal facet and $c_2$ is the official facet. The essence of IG is that a term with a larger IG value is better able to distinguish the two classes. We then select the terms whose IG values are above a certain threshold. To take sentiment into account, we also pick out sentiment terms to improve the results.


2.4.2  Building  lexicons 

In building the lexicons [8], the Mutual Information metric is split into two parts to obtain the personal and official facet weights:

$$personal(t) = p(t, personal)\log\frac{p(t, personal)}{p(t)\,p(personal)} + p(\bar{t}, official)\log\frac{p(\bar{t}, official)}{p(\bar{t})\,p(official)} \qquad (18)$$

where p(t, personal), p(t) and p(personal) are defined as follows:

$$p(t, personal) = df(t, personal) / |R| \qquad (19)$$

$$p(t) = df(t, R) / |R| \qquad (20)$$

$$p(personal) = df(personal) / |R| \qquad (21)$$


where df(t, personal) is the number of personal documents containing the term t, and |R| is the number of relevant documents in the collection, including both personal and official ones.

The  official  facet  weight  is  calculated  analogously  as  follows: 


$$official(t) = p(t, official)\log\frac{p(t, official)}{p(t)\,p(official)} + p(\bar{t}, personal)\log\frac{p(\bar{t}, personal)}{p(\bar{t})\,p(personal)} \qquad (22)$$


2.4.3  Scoring  and  Ranking 

We use the Vector Space Model (VSM) to score the blogs [9]. The vectors $(p(t_1|post), p(t_2|post), p(t_3|post), \ldots, p(t_n|post))$ and $(personal(t_1), personal(t_2), personal(t_3), \ldots, personal(t_n))$ represent the post to be judged and the personal lexicon respectively. The score of the post for the personal facet is calculated as follows:

$$personal\_score(post) = \sum_{i=1}^{n} p(t_i|post)\,personal(t_i) \qquad (23)$$

Similarly,  the  official  facet  score  is  calculated  as  follows: 

$$official\_score(post) = \sum_{i=1}^{n} p(t_i|post)\,official(t_i) \qquad (24)$$

Since a blog comprises many posts, its personal/official facet score is the sum of its posts' personal/official scores. The final score of a blog is computed as follows:

$$score(blog) = \frac{\sum_{i=1}^{n} personal\_score(post_i) - \sum_{i=1}^{n} official\_score(post_i)}{n}$$

We rank the blogs by this score in descending order. From the top to the bottom of the ranked list, the blogs' inclination towards the personal facet becomes weaker while the inclination towards the official facet becomes stronger. Finally, we obtain 100 personal and 100 official blogs with their respective rankings.
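The scoring of Eqs. (23) and (24) and the blog-level score can be sketched in Python as below. This is a simplified illustration: p(t_i|post) is assumed to be the relative frequency of the term in the post, the blog score is assumed to be the averaged difference between personal and official post scores, and the lexicon weights in the example are invented.

```python
from collections import Counter


def facet_score(post_terms, lexicon):
    """Dot product between the post's term distribution and the lexicon weights."""
    n = len(post_terms)
    if n == 0:
        return 0.0
    counts = Counter(post_terms)
    # p(t|post) is approximated by relative term frequency (an assumption)
    return sum(counts[t] / n * w for t, w in lexicon.items())


def blog_score(posts, personal_lex, official_lex):
    """Average of (personal score - official score) over the blog's posts."""
    if not posts:
        return 0.0
    diff = sum(facet_score(p, personal_lex) - facet_score(p, official_lex)
               for p in posts)
    return diff / len(posts)


personal_lex = {"i": 1.0, "my": 0.5}     # made-up weights for illustration
official_lex = {"company": 1.0}
diary = ["i", "love", "my", "dog"]
press = ["company", "news"]
```

A blog consisting only of the diary-style post scores positive (personal), one consisting only of the press-style post scores negative (official), and a mixed blog lands in between.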


3  Submission  and  Evaluation  Results 

We ran many experiments on this track. In this section, we present the empirical evaluation results of our different runs. We employed four performance metrics: mean average precision (MAP), binary preference (bPref), R-precision (R-prec) and precision at 10 (P@10).
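For readers unfamiliar with these metrics, the following Python sketch shows the standard definitions of P@k and of average precision for a single topic (MAP is the mean of average precision over all topics). It is not the official evaluation tool, and the function names are illustrative.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


ranking = ["a", "b", "c", "d"]
relevant_blogs = {"a", "c"}
ap = average_precision(ranking, relevant_blogs)
p_at_2 = precision_at_k(ranking, relevant_blogs, k=2)
```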

3.1  Blog  distillation 

We submitted two runs, which differ only in the query. The first run uses no query expansion; only the words in the title field are used. The second run is expanded by LQE. The evaluation results of the two submitted runs are listed in Table 1, where Pris and Prisb denote the “query-only” run and the “query-expansion” run respectively. On MAP, bPref and R-prec, LQE is more effective for the first facet value (MAP 0.1296 vs. 0.1218), while the “query-only” run is more effective for the second facet value (MAP 0.1668 vs. 0.1625).

3.2  Faceted  blog  distillation 

There are many runs for this sub-task, listed in Table 2 and Table 3. QO stands for the “query-only” run and QE stands for the “query-expansion” run. Std indicates that the run uses the standard baseline 1; if there is no “Std” in the run tag, the run uses our own baseline. PrisQO1, PrisQE1, PrisStdQO2 and PrisStdQE1 use both the opinion lexicon and the factual lexicon, with normalization, while PrisQO2, PrisQE2, PrisStdQO and PrisStdQE2 use the opinion lexicon and the normalization scheme but not the factual lexicon. Both lexicons are also used in PrisQO3, PrisStdQO3 and PrisStdQE3, but without normalization. For PrisQO4 and PrisStdQO4, only the opinion lexicon is used.

Table 2 shows that PrisStdQO2 obtains the best MAP and PrisStdQE2 the best bPref. In addition, PrisStdQO2 and PrisStdQO4 share the highest R-prec, and they share the highest P@10 with PrisStdQE2.


Table 1. Blog distillation results

Run            MAP      bPref    R-prec   P@10
pris.none      0.2355   0.2393   0.2981   0.3417
prisb.none     0.2210   0.2271   0.2885   0.3250
pris.first     0.1218   0.0973   0.1414   0.1542
prisb.first    0.1296   0.1056   0.1466   0.1292
pris.second    0.1668   0.1391   0.1731   0.1500
prisb.second   0.1625   0.1278   0.1519   0.1583

Table 2. Faceted blog distillation results of the first value

Run          MAP      bPref    R-prec   P@10
PrisQO1      0.0724   0.0690   0.0796   0.0958
PrisQO2      0.0679   0.0584   0.0620   0.0792
PrisQO3      0.0674   0.0635   0.0750   0.0917
PrisQO4      0.0641   0.0568   0.0555   0.0750
PrisQE1      0.0678   0.0675   0.0789   0.0833
PrisQE2      0.0732   0.0692   0.0743   0.0750
PrisStdQO    0.1037   0.0879   0.1031   0.1083
PrisStdQO2   0.1270   0.1254   0.1412   0.1167
PrisStdQO3   0.1060   0.0894   0.1099   0.1125
PrisStdQO4   0.1265   0.1264   0.1412   0.1167
PrisStdQE1   0.0931   0.0789   0.0883   0.1042
PrisStdQE2   0.1166   0.1279   0.1315   0.1167
PrisStdQE3   0.1137   0.1258   0.1315   0.1125

Table 3. Faceted blog distillation results of the second value

Run          MAP      bPref    R-prec   P@10
PrisQO1      0.0714   0.0631   0.0621   0.0667
PrisQO2      0.0451   0.0525   0.0623   0.0500
PrisQO3      0.0442   0.0493   0.0585   0.0417
PrisQO4      0.0448   0.0510   0.0623   0.0500
PrisQE1      0.0791   0.0813   0.0926   0.0708
PrisQE2      0.0801   0.0880   0.0856   0.0750
PrisStdQO    0.0716   0.0667   0.0642   0.0625
PrisStdQO2   0.0956   0.0975   0.0999   0.0583
PrisStdQO3   0.0706   0.0648   0.0642   0.0625
PrisStdQO4   0.0689   0.0651   0.0642   0.0625
PrisStdQE1   0.0947   0.0952   0.1066   0.0708
PrisStdQE2   0.0432   0.0612   0.0583   0.0500
PrisStdQE3   0.0432   0.0612   0.0583   0.0500

In Table 3, the best MAP, bPref, R-prec and P@10 are obtained by PrisStdQO2, PrisStdQO2, PrisStdQE1 and PrisQE2 respectively.

4  Conclusion 

In this paper, we presented a system for faceted blog distillation. Comparing Table 2 and Table 3 with Table 1, it can be concluded that most of the faceted models degrade performance relative to the baseline. In future research, we will focus on exploring more effective faceted models for faceted blog distillation.